Excess False Positive Rates in Methods for Differential Gene Expression Analysis using RNA-Seq Data
Authors
Abstract
Motivation: An important property of a valid method for testing for differential expression is that the false positive rate should at least roughly match the p-value cutoff. For example, if 10,000 genes are tested at a p-value cutoff of 10⁻⁴ and all of the null hypotheses are true, only about 1 gene should be declared significantly differentially expressed. We tested this property by resampling from existing RNA-Seq data sets and by matched negative binomial simulations.
Results: The methods we examined that rely strongly on a negative binomial model, such as edgeR, DESeq, and DESeq2, show large numbers of false positives both with resampled real data and with simulated negative binomial data. The same occurs with a negative binomial generalized linear model function in R. Methods that use only the variance function, such as limma-voom, do not show excessive false positives, and neither does a variance-stabilizing transformation followed by linear model analysis with limma. The excess false positives are likely caused by biases in the estimation of the negative binomial dispersion that appear small, and, perhaps surprisingly, they occur mostly when the mean and/or the dispersion is high rather than for low-count genes.
Supplementary Information: The computational tools developed for this study are freely available via our website http://dmrocke.ucdavis.edu/software.html. They can be downloaded as R code or run directly through an interactive web-based Shiny application, which reproduces the analysis presented here for a dataset and set of methods chosen by the user.
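The calibration argument in the Motivation, and the claim that small biases can produce many false positives, can be illustrated with a minimal Python sketch. It is a toy model, not the authors' analysis: it assumes a two-sided Wald z-test whose variance estimate is off by a constant factor (the hypothetical parameter `variance_ratio`, estimated variance divided by true variance), rather than the negative binomial dispersion estimators used by edgeR, DESeq, or DESeq2. The function names are likewise hypothetical. Under these assumptions, a valid test gives about `n_genes * cutoff` false positives, while an underestimated variance lowers the effective critical value and inflates the tail probability.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def expected_false_positives(n_genes: int, cutoff: float) -> float:
    """Expected count of null genes with p < cutoff when p-values are valid.

    A valid test gives uniform p-values under the null, so each null gene
    falls below the cutoff with probability `cutoff`.
    """
    return n_genes * cutoff


def actual_rate_with_variance_bias(cutoff: float, variance_ratio: float) -> float:
    """True two-sided false positive rate of a toy Wald z-test (hypothetical model).

    variance_ratio = estimated variance / true variance. A value below 1
    (variance underestimated) inflates the reported statistic by a factor
    of 1 / sqrt(variance_ratio).
    """
    # Nominal two-sided critical value for the requested cutoff.
    z_crit = -STD_NORMAL.inv_cdf(cutoff / 2)
    # Reported T = Z / sqrt(variance_ratio), with Z ~ N(0, 1) under the null,
    # so |T| > z_crit is equivalent to |Z| > z_crit * sqrt(variance_ratio).
    effective_crit = z_crit * variance_ratio ** 0.5
    # Use the lower tail, cdf(-x), to avoid cancellation in 1 - cdf(x).
    return 2 * STD_NORMAL.cdf(-effective_crit)


if __name__ == "__main__":
    n_genes, cutoff = 10_000, 1e-4
    print(f"valid test: {expected_false_positives(n_genes, cutoff):.1f} expected false positives")
    for ratio in (1.0, 0.9, 0.8):
        rate = actual_rate_with_variance_bias(cutoff, ratio)
        print(f"variance ratio {ratio}: {n_genes * rate:.1f} expected false positives")
```

Under these assumptions, underestimating the variance by 10% raises the expected number of false positives among 10,000 null genes at 10⁻⁴ from about 1 to about 2.2, and a 20% underestimate raises it to about 5. At a cutoff of 0.05, the same 20% bias inflates the rate by only about 1.6 times. This is one way a bias that looks small can matter a great deal at the stringent cutoffs used in genome-wide testing.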
Similar resources
Regulatory effects of cis- and trans-LncRNAs on differential expression of genes following infection with viral hemorrhagic septicemia virus in rainbow trout (Oncorhynchus mykiss)
In this study, the cis- and trans-regulatory effects of long non-coding RNAs (lncRNAs) on the expression of genes in fish infected with viral hemorrhagic septicemia virus (VHS) were investigated using RNA-seq technology. At the end of the experimental period (day 35), total RNA was extracted from spleen tissue (group treated with virus) and physiological serum (control group) was used to pr...
Controlling False Positive Rates in Methods for Differential Gene Expression Analysis using RNA-Seq Data
We review existing methods for the analysis of RNA-Seq data and place them in a common framework of a sequence of tasks that are usually part of the process. We show that many existing methods produce large numbers of false positives in cases where the null hypothesis is true by construction and where actual data from RNA-Seq studies are used, as opposed to simulations that make specific assump...
Gene Expression Profile Analysis during Mouse Tooth Development
Introduction: Complex molecular pathways are involved in the development of different tissues, such as teeth. Differential gene expression patterns during tooth development generate different tooth types. Tooth development results from interactions between the oral epithelium and the underlying ectomesenchymal cells of neural crest origin. Tooth development is regulated by different signaling networks. In thi...
Differential Expression Analysis in RNA-Seq by a Naive Bayes Classifier with Local Normalization
To improve the applicability of RNA-seq technology, a large number of RNA-seq data analysis methods and correction algorithms have been developed. Although these new methods and algorithms have steadily improved transcriptome analysis, greater prediction accuracy is needed to better guide experimental designs with computational results. In this study, a new tool for the identification of differ...
A comparison of RNA-Seq and high-density exon array for detecting differential gene expression between closely related species
RNA-Seq has emerged as a revolutionary technology for transcriptome analysis. In this article, we report a systematic comparison of RNA-Seq and high-density exon array for detecting differential gene expression between closely related species. On a panel of human/chimpanzee/rhesus cerebellum RNA samples previously examined by the high-density human exon junction array (HJAY) and real-time qPCR,...